
Add source_software auto-detection for dataset loading#920

Open
M0hammed-Reda wants to merge 2 commits into neuroinformatics-unit:main from M0hammed-Reda:feat/auto-source-software-inference

M0hammed-Reda commented Mar 21, 2026

Description

What is this PR

  • Bug fix
  • Addition of a new feature
  • Other

Why is this PR needed?

Loading data with load_dataset() currently requires users to pass source_software explicitly, even when the file format is distinctive enough to infer it automatically. This adds boilerplate to common workflows and makes the API less convenient when users simply want to load a file in a supported format.

At the same time, some DLC-style CSV files are valid for both DeepLabCut and LightningPose, so inference should resolve unambiguous cases automatically but never silently guess in ambiguous ones.
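
To illustrate the overlap, here is a minimal sketch of a header check, assuming both tools write the three-row DLC-style CSV header (scorer / bodyparts / coords). The function has_dlc_style_header and the sample rows are hypothetical illustrations, not part of movement or of this PR. Because the check only looks at the header layout, it returns True for a DeepLabCut file and a LightningPose file alike, which is why header shape alone cannot decide between them.

```python
import csv
from typing import Iterable

# Assumed header labels shared by DeepLabCut and LightningPose CSV output.
DLC_HEADER_LABELS = ("scorer", "bodyparts", "coords")


def has_dlc_style_header(lines: Iterable[str]) -> bool:
    """Return True if the first three CSV rows start with the DLC-style labels."""
    reader = csv.reader(lines)
    first_cells = []
    for _ in range(len(DLC_HEADER_LABELS)):
        row = next(reader, None)
        if not row:  # too few rows, or an empty row
            return False
        first_cells.append(row[0].strip().lower())
    return tuple(first_cells) == DLC_HEADER_LABELS


# Made-up sample header; an open text file object works the same way.
sample = [
    "scorer,my_model,my_model,my_model\n",
    "bodyparts,nose,nose,nose\n",
    "coords,x,y,likelihood\n",
    "0,10.0,20.0,0.9\n",
]
print(has_dlc_style_header(sample))  # True
```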

What does this PR do?

  • adds infer_source_software(file) to infer the source software from the input file
  • adds support for source_software="auto" in load_dataset()
  • makes load_dataset() default to automatic source inference
  • uses registered validators to infer supported formats
  • handles the DeepLabCut vs LightningPose CSV overlap with a small heuristic
  • raises a clear ValueError when a DLC-style CSV is genuinely ambiguous instead of silently defaulting
  • exports infer_source_software from movement.io
  • adds unit tests covering supported formats and an ambiguous DLC-style CSV case
  • updates the user guide to document automatic inference and ambiguous CSV behavior
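
The validator-based inference listed above could look roughly like the following stand-alone sketch. It is a hypothetical illustration, not the code in this PR: VALIDATORS, register_validator, and the suffix-only toy validators are assumptions (real validators would inspect file contents), and the DeepLabCut vs LightningPose tie-breaking heuristic is deliberately left out, so any CSV matching both simply raises a ValueError.

```python
from pathlib import Path
from typing import Callable, Dict, Optional

# Hypothetical registry: source-software name -> validator returning True
# when a file looks like that software's output.
Validator = Callable[[Path], bool]
VALIDATORS: Dict[str, Validator] = {}


def register_validator(name: str) -> Callable[[Validator], Validator]:
    """Decorator that adds a validator to the registry under `name`."""
    def decorator(func: Validator) -> Validator:
        VALIDATORS[name] = func
        return func
    return decorator


# Toy, suffix-only validators (assumption: real ones inspect file contents).
@register_validator("NWB")
def _looks_like_nwb(path: Path) -> bool:
    return path.suffix == ".nwb"


@register_validator("SLEAP")
def _looks_like_sleap(path: Path) -> bool:
    return path.suffix == ".slp"


# Both accept DLC-style CSVs, reproducing the overlap described in the PR.
@register_validator("DeepLabCut")
def _looks_like_dlc(path: Path) -> bool:
    return path.suffix == ".csv"


@register_validator("LightningPose")
def _looks_like_lp(path: Path) -> bool:
    return path.suffix == ".csv"


def infer_source_software(
    file, validators: Optional[Dict[str, Validator]] = None
) -> str:
    """Return the single matching source software, or raise ValueError."""
    path = Path(file)
    registry = VALIDATORS if validators is None else validators
    # Run every validator so ambiguity is detected, not hidden by order.
    matches = sorted(name for name, check in registry.items() if check(path))
    if not matches:
        raise ValueError(
            f"Could not infer source_software for '{path.name}'; "
            "please pass it explicitly."
        )
    if len(matches) > 1:
        raise ValueError(
            f"'{path.name}' matches several formats ({', '.join(matches)}); "
            "please pass source_software explicitly."
        )
    return matches[0]


print(infer_source_software("session.nwb"))  # NWB
```

Collecting all matches before deciding, rather than returning the first validator that succeeds, is what lets the function report ambiguity instead of depending on registration order.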

References

Closes #919

How has this PR been tested?

This PR was tested locally with:

pytest tests/test_unit/test_io/test_load.py

This covers:

  • direct testing of infer_source_software()
  • loading via load_dataset(..., source_software="auto")
  • supported formats including DeepLabCut, LightningPose, SLEAP, Anipose, VIA-tracks, and NWB
  • the ambiguous DLC-style CSV case, which now raises a clear error

I also ran the repository pre-commit checks as part of committing the changes.

Is this a breaking change?

  • Yes
  • No

Existing explicit uses of load_dataset(..., source_software=...) are unchanged. This PR adds automatic inference as a convenience feature. In ambiguous DLC-style CSV cases, inference now raises a clear error instead of silently guessing, but that behavior only applies to the new auto-inference path.

Does this PR require an update to the documentation?

  • Yes
  • No

The user guide has been updated to:

  • describe automatic source software inference
  • show usage with source_software="auto"
  • explain that ambiguous DLC-style CSV files may still require an explicit source_software

Checklist:

  • The code has been tested locally
  • Tests have been added to cover all new functionality
  • The documentation has been updated to reflect any changes
  • The code has been formatted with pre-commit

sonarqubecloud bot commented Apr 1, 2026

Development

Successfully merging this pull request may close these issues.

Add automatic source_software inference to load_dataset